Translated by Google Translate
This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that can then be employed to learn new tasks. By performing iterative pruning and network re-training, we are able to sequentially "pack" multiple tasks into a single network while ensuring minimal drop in performance and minimal storage overhead. Unlike prior work that uses proxy losses to maintain accuracy on older tasks, we always optimize for the task at hand. We perform extensive experiments on a variety of network architectures and large-scale datasets, and observe much better robustness against catastrophic forgetting than prior work. In particular, we are able to add three fine-grained classification tasks to a single ImageNet-trained VGG-16 network and achieve accuracies close to those of separately trained networks for each task. Code available at https://github.com/arunmallya/packnet
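The prune-then-retrain idea described in this abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see their repository for that). The function names, the per-weight `owner` array, the magnitude-based pruning criterion, and the plain gradient step are all assumptions made for illustration.

```python
import numpy as np

def prune_for_task(weights, owner, task_id, prune_fraction):
    """Zero out and release the smallest-magnitude weights owned by task_id.

    owner[i] == 0 marks a free weight; owner[i] == t marks a weight reserved
    for task t. Returns modified copies of (weights, owner).
    """
    w = weights.astype(float).ravel().copy()
    o = owner.ravel().copy()
    idx = np.flatnonzero(o == task_id)
    n_release = int(len(idx) * prune_fraction)
    if n_release > 0:
        # Sort the task's weights by magnitude and release the smallest ones.
        order = np.argsort(np.abs(w[idx]))
        release = idx[order[:n_release]]
        w[release] = 0.0
        o[release] = 0
    return w.reshape(weights.shape), o.reshape(owner.shape)

def claim_free_weights(owner, task_id):
    """Assign every currently free weight to a new task."""
    o = owner.copy()
    o[o == 0] = task_id
    return o

def masked_update(weights, grad, owner, task_id, lr):
    """Gradient step that only changes weights owned by task_id,
    so weights reserved for earlier tasks stay frozen."""
    mask = (owner == task_id)
    return weights - lr * grad * mask

def inference_mask(owner, task_id):
    """Weights visible when running task_id: its own and all earlier tasks'."""
    return (owner > 0) & (owner <= task_id)

# Toy example: four weights, all initially trained for task 1.
w = np.array([0.5, -0.1, 0.9, 0.05])
owner = np.ones(4, dtype=int)
w, owner = prune_for_task(w, owner, task_id=1, prune_fraction=0.5)
owner = claim_free_weights(owner, task_id=2)
w2 = masked_update(w, grad=np.ones(4), owner=owner, task_id=2, lr=0.1)
```

In this toy run the two smallest weights are released from task 1 and then trained for task 2, while the two surviving task 1 weights are left unchanged by the task 2 update.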
The Flickr30k dataset has become a standard benchmark for sentence-based image description. This paper presents Flickr30k Entities, which augments the 158k captions from Flickr30k with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. Such annotations are essential for continued progress in automatic image description and grounded language understanding. They enable us to define a new benchmark for localization of textual entity mentions in an image. We present a strong baseline for this task that combines an image-text embedding, detectors for common objects, a color classifier, and a bias towards selecting larger objects. While our baseline rivals in accuracy more complex state-of-the-art models, we show that its gains cannot be easily parlayed into improvements on such tasks as image-sentence retrieval, thus underlining the limitations of current methods and the need for further research.
We introduce a linguistically enhanced combination of pre-training methods for transformers. The pre-training objectives include POS-tagging, synset prediction based on semantic knowledge graphs, and parent prediction based on dependency parse trees. Our approach achieves competitive results on the Natural Language Inference task compared to the state of the art. For smaller models in particular, the method yields a significant performance boost, which indicates that intelligent pre-training can make up for fewer parameters and help build more efficient models. Combining POS-tagging and synset prediction yields the best overall results.
Designing efficient and labor-saving prosthetic hands requires powerful hand gesture recognition algorithms that can achieve high accuracy with limited complexity and latency. In this context, the paper proposes a compact deep learning framework, referred to as CT-HGR, which employs a vision transformer network to conduct hand gesture recognition using high-density sEMG (HD-sEMG) signals. The attention mechanism in the proposed model identifies similarities among different data segments with a greater capacity for parallel computation and addresses memory limitations when dealing with inputs of large sequence length. CT-HGR can be trained from scratch without any need for transfer learning and can simultaneously extract both temporal and spatial features of HD-sEMG data. Additionally, the CT-HGR framework can perform instantaneous recognition using an sEMG image spatially composed from HD-sEMG signals. A variant of CT-HGR is also designed to incorporate microscopic neural drive information in the form of Motor Unit Spike Trains (MUSTs) extracted from HD-sEMG signals using Blind Source Separation (BSS). This variant is combined with its baseline version via a hybrid architecture to evaluate the potential of fusing macroscopic and microscopic neural drive information. The HD-sEMG dataset used involves 128 electrodes that collect the signals related to 65 isometric hand gestures of 20 subjects. The proposed CT-HGR framework is applied to window sizes of 31.25, 62.5, 125, and 250 ms of this dataset, using 32, 64, and 128 electrode channels. The average accuracy over all participants using 32 electrodes and a window size of 31.25 ms is 86.23%, which gradually increases to 91.98% for 128 electrodes and a window size of 250 ms. CT-HGR achieves an accuracy of 89.13% for instantaneous recognition based on a single frame of an HD-sEMG image.
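To make the window sizes above concrete, the following sketch cuts a multi-channel recording into fixed-length windows. It is an illustration only, not the paper's preprocessing pipeline: the function name `segment_windows`, the optional stride, and in particular the 2048 Hz sampling rate used in the example are assumptions that the abstract does not state.

```python
import numpy as np

def segment_windows(signal, fs_hz, window_ms, stride_ms=None):
    """Split a (n_samples, n_channels) recording into fixed-length windows.

    Returns an array of shape (n_windows, window_samples, n_channels).
    If stride_ms is None, windows do not overlap.
    """
    n_samples, n_channels = signal.shape
    win = int(round(window_ms * fs_hz / 1000.0))
    stride = win if stride_ms is None else int(round(stride_ms * fs_hz / 1000.0))
    if win <= 0 or stride <= 0:
        raise ValueError("window and stride must cover at least one sample")
    if n_samples < win:
        return np.empty((0, win, n_channels), dtype=signal.dtype)
    starts = range(0, n_samples - win + 1, stride)
    return np.stack([signal[s:s + win] for s in starts])

# One second of placeholder data: 32 channels at an ASSUMED rate of 2048 Hz.
fs_hz = 2048
recording = np.zeros((fs_hz, 32))
short = segment_windows(recording, fs_hz, window_ms=31.25)
long = segment_windows(recording, fs_hz, window_ms=250)
```

Under the assumed sampling rate, a 31.25 ms window holds 64 samples and a 250 ms window holds 512, which shows the trade-off the abstract reports: shorter windows give lower latency but less signal per decision.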
Mission teams are exposed to the emotional toll of life-and-death decisions. These are small groups of specially trained people, supported by intelligent machines, who deal with stressful environments and scenarios. We developed a composite model for stress monitoring in such teams of humans and autonomous machines. The model aims to identify the conditions that may contribute to mission failure. It is composed of three parts: 1) a computational logic part that statically describes the stress states of teammates; 2) a decision part that reports the mission status at any time; 3) a stress propagation part based on the standard Susceptible-Infected-Susceptible (SIS) paradigm. In contrast to approaches such as agent-based, random-walk, and game models, the proposed model combines various mechanisms to satisfy the conditions of stress propagation in small groups. Our core approach involves data structures such as decision tables and decision diagrams. These tools are adaptable to human-machine teaming as well.
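For readers unfamiliar with the SIS paradigm mentioned above, here is a minimal discrete-time sketch for a small, fully connected team. It is not the paper's model: the function names, the parameters `beta` (per-contact transmission probability) and `gamma` (recovery probability), and the assumption that every teammate interacts with every other are illustrative choices.

```python
import random

def sis_step(stressed, beta, gamma, rng):
    """Advance a fully connected team by one time step.

    stressed: list of bools, True if that teammate is currently stressed.
    A stressed teammate recovers with probability gamma and is then
    susceptible again. A calm teammate becomes stressed with probability
    1 - (1 - beta) ** k, where k is the number of stressed teammates.
    """
    k = sum(stressed)
    p_infect = 1.0 - (1.0 - beta) ** k
    nxt = []
    for is_stressed in stressed:
        if is_stressed:
            nxt.append(rng.random() >= gamma)   # stays stressed unless recovered
        else:
            nxt.append(rng.random() < p_infect)
    return nxt

def simulate(initial, beta, gamma, steps, seed=0):
    """Return the number of stressed teammates at each step, including step 0."""
    rng = random.Random(seed)
    state = list(initial)
    counts = [sum(state)]
    for _ in range(steps):
        state = sis_step(state, beta, gamma, rng)
        counts.append(sum(state))
    return counts

team = [True, False, False, False]
spread = simulate(team, beta=1.0, gamma=0.0, steps=2)   # certain spread, no recovery
fade = simulate(team, beta=0.0, gamma=1.0, steps=2)     # no spread, certain recovery
```

The two extreme settings bracket the behaviour: with certain transmission and no recovery the whole team is stressed after one step, and with no transmission and certain recovery the stress dies out after one step. Intermediate values of `beta` and `gamma` give stochastic trajectories between these cases.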
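In this paper, we study the performance and fairness of visual and thermal images and extend the evaluation to masked synthetic images. Using the SeadyFace and Thermal Mask datasets, we propose a process for evaluating fairness on real images and show how the same process can be applied to synthetic images. The demographic disparity is 1.59 for random guessing, and 1.59 when recognition performance improves to 99.99%. We show that inherently biased datasets can deeply affect the fairness of any biometric system. The main cause of a biased dataset is class imbalance resulting from the data collection process. To address imbalanced datasets, synthetic images can be used to augment the classes with fewer samples, producing a more balanced dataset and therefore less bias when training a machine learning system. For biometric-enabled systems, fairness is of critical importance, and the related concept of equity, diversity, and inclusion (EDI) is well suited to generalizing fairness in biometrics. We focus on the three most common demographic groups: age, gender, and ethnicity.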